[fix](auth) add auth check for manager node and query qerror REST APIs#65042
Conversation
… query qerror REST APIs
The node management endpoints (POST /rest/v2/manager/node/{action}/{fe,be,broker})
allowed adding or dropping cluster nodes without any authentication or
authorization. Add executeCheckPassword + checkAdminAuth so they require an
authenticated ADMIN user, consistent with set_config/fe and set_config/be.
GET /rest/v2/manager/query/qerror/{id} (getStats) had neither authentication
nor authorization: its signature took no request/response and the global
AuthInterceptor only covers /rest/v1/**, so it was reachable anonymously even
with enable_all_http_auth=true. Add executeCheckPassword and
checkAuthByUserAndQueryId, matching the /profile and /trace_id endpoints, so a
non-admin can only read their own query stats.
Add a p0 regression test covering both gaps.
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
Centralize auth into one interceptor — having every v2 endpoint roll its own is a footgun. |
The admin-positive assertions used ADD with 127.0.0.1 addresses, which on a real (distributed) cluster would not match an existing node and would actually register a phantom FE observer / BE into the editlog with no cleanup, polluting cluster state and risking later tests. Switch the positive path to DROP on RFC 5737 TEST-NET addresses (192.0.2.x), which can never match a real node: it reaches the operation, returns a harmless 'does not exist' error, proves the ADMIN check passed, and mutates nothing. The negative (non-admin) cases keep ADD since the auth check rejects them before the node operation runs.
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 29592 ms |
TPC-H: Total hot run time: 29946 ms |
TPC-DS: Total hot run time: 175487 ms |
TPC-DS: Total hot run time: 174181 ms |
ClickBench: Total hot run time: 25.5 s |
ClickBench: Total hot run time: 25.44 s |
FE Regression Coverage ReportIncrement line coverage |
|
run nonConcurrent |
|
/review |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Automated review summary:
I found two coverage gaps in the new auth regression test. The controller changes themselves follow the existing REST auth helper pattern, but the test does not cover every changed security-sensitive path: broker node mutation is untested, and qerror only covers anonymous rejection rather than the newly added authenticated ownership check.
Critical checkpoints:
- Goal: add auth/authz to manager node mutation endpoints and qerror; implementation does this, but tests do not fully prove all changed paths.
- Scope/focus: code changes are small and focused.
- Concurrency/lifecycle/config: no new concurrency or lifecycle state; behavior depends on the existing mutable enable_all_http_auth config and existing HTTP auth helpers.
- Compatibility: no serialization or storage format change.
- Parallel paths: FE/BE/Broker node operations are parallel; broker coverage is missing.
- Tests: gaps called out inline. I could not run FE build or regression tests in this checkout because thirdparty/installed and thirdparty/installed/bin/protoc are missing.
User focus: no additional user-provided focus points were supplied.
Subagent conclusions: optimizer-rewrite found no candidate; tests-session-config proposed TSC-1, accepted as MAIN-1. The main pass added MAIN-2 for qerror ownership coverage. Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for this same two-comment set.
branch-4.0 does not have #60761 (HTTP API auth framework rework), so ActionAuthorizationInfo has no userIdentity field and checkAdminAuth() does not exist. Use the existing branch-4.0 pattern instead, matching setConfigBe/setConfigFe in the same file.
Proposed changes
Several manager REST APIs under
/rest/v2/managerwere missing authentication and/or authorization. This PR closes those gaps.1. Node management endpoints — missing auth + authz
POST /rest/v2/manager/node/{action}/fe,/{action}/be,/{action}/broker(operateFrontends/operateBackend/operateBroker) could add or drop FE / BE / Broker nodes without any authentication or authorization. Any caller able to reach the FE HTTP port could change cluster topology.Added, consistent with the sibling
set_config/feandset_config/beendpoints:2.
GET /rest/v2/manager/query/qerror/{id}(getStats) — fully unauthenticatedThis endpoint had neither authentication nor authorization: its method signature didn't even take
HttpServletRequest/HttpServletResponse, so it could not callexecuteCheckPassword, and the globalAuthInterceptoronly covers/rest/v1/**. As a result it was reachable anonymously even withenable_all_http_auth=true, leaking per-query stats-error information.Aligned it with the
/profileand/trace_idendpoints — authenticate, then restrict non-admin users to their own queries:Test
Added
regression-test/suites/auth_p0/test_http_node_action_auth.groovy(p0,auth,nonConcurrent):ADD /feandADD /beis rejected;grant 'admin', the request passes the auth check;/qerror/{id}is rejected.FE compiles cleanly (
build.sh --fe).